6 research outputs found

    Structured clustering representations and methods

    Rather than designing focused experiments to test individual hypotheses, scientists now commonly acquire measurements using massively parallel techniques, for post hoc interrogation. The resulting data is both high-dimensional and structured, in that observed variables are grouped and ordered into related subspaces, reflecting both natural physical organization and factorial experimental designs. Such structure encodes critical constraints and clues to interpretation, but typical unsupervised learning methods assume exchangeability and fail to account adequately for the structure of data in a flexible and interpretable way. In this thesis, I develop computational methods for exploratory analysis of structured high-dimensional data, and apply them to study gene expression regulation in Parkinson’s (PD) and Huntington’s diseases (HD). BOMBASTIC (Block-Organized, Model-Based, Tree-Indexed Clustering) is a methodology to cluster and visualize data organized in pre-specified subspaces, by combining independent clusterings of blocks into hierarchies. BOMBASTIC provides a formal specification of the block-clustering problem and a modular implementation that facilitates integration, visualization, and comparison of diverse datasets and rapid exploration of alternative analyses. These tools, along with standard methods, were applied to study gene expression in mouse models of neurodegenerative diseases, in collaboration with Dr. Myriam Heiman and Dr. Robert Fenster. In PD, I analyzed cell-type-specific expression following levodopa treatment to study mechanisms underlying levodopa-induced dyskinesia (LID). I identified likely regulators of the transcriptional changes leading to LID and implicated signaling pathways amenable to pharmacological modulation (Heiman, Heilbut et al, 2014). 
    In HD, I analyzed multiple mouse models (Kuhn, 2007), cell-type-specific profiles of medium spiny neurons (Fenster, 2011), and an RNA-Seq dataset profiling multiple tissue types over time and across an mHTT allelic series (CHDI, 2015). I found evidence suggesting that altered activity of the PRC2 complex contributes significantly to the transcriptional dysregulation observed in striatal neurons in HD.

    Large-scale mapping of human protein–protein interactions by mass spectrometry

    Mapping protein–protein interactions is an invaluable tool for understanding protein function. Here, we report the first large-scale study of protein–protein interactions in human cells using a mass spectrometry-based approach. The study maps protein interactions for 338 bait proteins that were selected based on known or suspected disease and functional associations. Large-scale immunoprecipitation of Flag-tagged versions of these proteins followed by LC-ESI-MS/MS analysis resulted in the identification of 24 540 potential protein interactions. False positives and redundant hits were filtered out using empirical criteria and a calculated interaction confidence score, producing a data set of 6463 interactions between 2235 distinct proteins. This data set was further cross-validated using previously published and predicted human protein interactions. In-depth mining of the data set shows that it represents a valuable source of novel protein–protein interactions with relevance to human diseases. In addition, via our preliminary analysis, we report many novel protein interactions and pathway associations.

    Complex Reorganization and Predominant Non-Homologous Repair Following Chromosomal Breakage in Karyotypically Balanced Germline Rearrangements and Transgenic Integration

    We defined the genetic landscape of balanced chromosomal rearrangements at nucleotide resolution by sequencing 141 breakpoints from cytogenetically-interpreted translocations and inversions. We confirm that the recently described phenomenon of “chromothripsis” (massive chromosomal shattering and reorganization) is not unique to cancer cells but also occurs in the germline where it can resolve to a karyotypically balanced state with frequent inversions. We detected a high incidence of complex rearrangements (19.2%) and substantially less reliance on microhomology (31%) than previously observed in benign CNVs. We compared these results to experimentally-generated DNA breakage-repair by sequencing seven transgenic animals, and revealed extensive rearrangement of the transgene and host genome with similar complexity to human germline alterations. Inversion is the most common rearrangement, suggesting that a combined mechanism involving template switching and non-homologous repair mediates the formation of balanced complex rearrangements that are viable, stably replicated and transmitted unaltered to subsequent generations.

    Next-Generation Sequencing Strategies Enable Routine Detection of Balanced Chromosome Rearrangements for Clinical Diagnostics and Genetic Research

    The contribution of balanced chromosomal rearrangements to complex disorders remains unclear because they are not detected routinely by genome-wide microarrays and clinical localization is imprecise. Failure to consider these events bypasses a potentially powerful complement to single nucleotide polymorphism and copy-number association approaches to complex disorders, where much of the heritability remains unexplained. To capitalize on this genetic resource, we have applied optimized sequencing and analysis strategies to test whether these potentially high-impact variants can be mapped at reasonable cost and throughput. By using a whole-genome multiplexing strategy, rearrangement breakpoints could be delineated at a fraction of the cost of standard sequencing. For rearrangements already mapped regionally by karyotyping and fluorescence in situ hybridization, a targeted approach enabled capture and sequencing of multiple breakpoints simultaneously. Importantly, this strategy permitted capture and unique alignment of up to 97% of repeat-masked sequences in the targeted regions. Genome-wide analyses estimate that only 3.7% of bases should be routinely omitted from genomic DNA capture experiments. Illustrating the power of these approaches, the rearrangement breakpoints were rapidly defined to base pair resolution and revealed unexpected sequence complexity, such as co-occurrence of inversion and translocation as an underlying feature of karyotypically balanced alterations. These findings have implications ranging from genome annotation to de novo assemblies and could enable sequencing screens for structural variations at a cost comparable to that of microarrays in standard clinical practice.

    Sequencing Chromosomal Abnormalities Reveals Neurodevelopmental Loci that Confer Risk across Diagnostic Boundaries

    Balanced chromosomal abnormalities (BCAs) represent a relatively untapped reservoir of single-gene disruptions in neurodevelopmental disorders (NDDs). We sequenced BCAs in patients with autism or related NDDs, revealing disruption of 33 loci in four general categories: (1) genes previously associated with abnormal neurodevelopment (e.g., AUTS2, FOXP1, and CDKL5), (2) single-gene contributors to microdeletion syndromes (MBD5, SATB2, EHMT1, and SNURF-SNRPN), (3) novel risk loci (e.g., CHD8, KIRREL3, and ZNF507), and (4) genes associated with later-onset psychiatric disorders (e.g., TCF4, ZNF804A, PDE10A, GRIN2B, and ANK3). We also discovered among neurodevelopmental cases a profoundly increased burden of copy-number variants from these 33 loci and a significant enrichment of polygenic risk alleles from genome-wide association studies of autism and schizophrenia. Our findings suggest a polygenic risk model of autism and reveal that some neurodevelopmental genes are sensitive to perturbation by multiple mutational mechanisms, leading to variable phenotypic outcomes that manifest at different life stages.